Abstract
We present a rigorous decomposition of how the DBSCAN clustering algorithm's behavior changes as the minimum points parameter (minPts) increases while holding the neighborhood radius $\varepsilon$ fixed. Through two main theorems, we establish: (1) an exact matching-based decomposition of the change in total inter-cluster center separation into motion and topological components, and (2) a precise formula for centroid shifts under point removal ("peeling"). These results provide a mathematical framework for understanding the dual mechanisms—boundary erosion and cluster birth/death events—that govern DBSCAN's sensitivity to parameter choices.
Fix $\varepsilon>0$ and a dataset $X=\{x_1,\dots,x_n\}\subset\mathbb{R}^3$. For each integer $m\in\{1,\dots,n\}$, run DBSCAN with parameters $(\varepsilon,\text{minPts}=m)$. Let the non-noise clusters be
$$ \mathcal{C}(m)=\{C_1^{(m)},\dots,C_{k(m)}^{(m)}\}, $$and define each cluster center as the mean
$$ \mu_i^{(m)}:=\frac{1}{|C_i^{(m)}|}\sum_{x\in C_i^{(m)}} x\in\mathbb{R}^3. $$Define the total inter-center separation
$$ S(m):=\sum_{1\le iTheorem 1: Matching Decomposition of $\Delta S(m)$ into Motion vs. Topology
Let $I=\{1,\dots,k(m)\}$ index clusters at $m$, and $J=\{1,\dots,k(m+1)\}$ index clusters at $m+1$. Form the cost matrix
$$ c_{ij}:=\big|\mu_i^{(m)}-\mu_j^{(m+1)}\big|\qquad (i\in I,\;j\in J). $$Let $M\subset I\times J$ be a minimum-cost one-to-one matching between cluster centers at $m$ and at $m+1$ (e.g. the Hungarian algorithm). Define the matched ("survivor") index sets
$$ I_M:=\{i\in I:\exists j,(i,j)\in M\},\qquad J_M:=\{j\in J:\exists i,(i,j)\in M\}, $$and the unmatched sets
$$ I_D:=I\setminus I_M \quad(\text{deaths}),\qquad J_B:=J\setminus J_M \quad(\text{births}). $$For each $i\in I_M$, let $\pi(i)\in J_M$ denote its match, and define the center shift
$$ \delta_i := \mu_{\pi(i)}^{(m+1)}-\mu_i^{(m)}. $$Then $\Delta S(m)$ decomposes exactly as
where
$$ S_{MM}^{\text{old}}=\sum_{i<\ell, i,\ell\in I_M}|\mu_i^{(m)}-\mu_\ell^{(m)}|, $$ $$ S_{MM}^{\text{new}}=\sum_{i<\ell, i,\ell\in I_M}|\mu_{\pi(i)}^{(m+1)}-\mu_{\pi(\ell)}^{(m+1)}|, $$ $$ S_{MB}^{\text{new}}=\sum_{j\in J_B}\sum_{i\in I_M}|\mu_j^{(m+1)}-\mu_{\pi(i)}^{(m+1)}|, $$ $$ S_{BB}^{\text{new}}=\sum_{j<\ell, j,\ell\in J_B}|\mu_j^{(m+1)}-\mu_\ell^{(m+1)}|, $$ $$ S_{MD}^{\text{old}}=\sum_{d\in I_D}\sum_{i\in I_M}|\mu_d^{(m)}-\mu_i^{(m)}|, $$ $$ S_{DD}^{\text{old}}=\sum_{d<e, d,e\in I_D}|\mu_d^{(m)}-\mu_e^{(m)}|. $$Moreover, the motion term is controlled by the shifts:
- Motion: the same clusters' centers moved (peeling, reassignments, shrinkage).
- Topology: clusters appeared/disappeared under the one-to-one matching (splits show up as one survivor plus extra births; vanishing shows up as deaths).
Likewise, partition $J=J_M\sqcup J_B$. Using the matching bijection $i\mapsto \pi(i)$ between $I_M$ and $J_M$, the sum defining $S(m+1)$ splits as
$$ S(m+1)=S_{MM}^{\text{new}}+S_{MB}^{\text{new}}+S_{BB}^{\text{new}}. $$Subtracting yields the displayed exact decomposition.
For the bound, fix matched indices $i,\ell\in I_M$ and write
$$ \mu_{\pi(i)}^{(m+1)}=\mu_i^{(m)}+\delta_i,\qquad \mu_{\pi(\ell)}^{(m+1)}=\mu_\ell^{(m)}+\delta_\ell. $$Then the norm is 1-Lipschitz:
$$ \Big||(\mu_i-\mu_\ell)+(\delta_i-\delta_\ell)|-|\mu_i-\mu_\ell|\Big| \le |\delta_i-\delta_\ell| \le |\delta_i|+|\delta_\ell|. $$Summing over all $\binom{|I_M|}{2}$ pairs counts each $|\delta_i|$ exactly $(|I_M|-1)$ times, giving
$$ \big|S_{MM}^{\text{new}}-S_{MM}^{\text{old}}\big|\le (|I_M|-1)\sum_{i\in I_M}|\delta_i|. $$ □Theorem 2: Exact Centroid Shift Under "Peeling" When minPts Increases
Consider a matched cluster $i\in I_M$. Suppose that raising minPts from $m$ to $m+1$ acts on this cluster by removing a subset of its points (turning them into noise or assigning them elsewhere) while keeping the rest, i.e.
$$ C_i^{(m)} = A_i \sqcup R_i,\qquad C_{\pi(i)}^{(m+1)} = A_i, $$where $A_i$ are the kept points and $R_i$ are the removed ("peeled") points, disjoint.
Let $\mu_{A_i}$ and $\mu_{R_i}$ denote the means of $A_i$ and $R_i$. Then the center shift $\delta_i=\mu_{\pi(i)}^{(m+1)}-\mu_i^{(m)}$ satisfies the exact identity
In particular,
- how much mass was removed ($|R_i|/|A_i|$), and
- how far the removed mass sat from the old center ($|\mu_i^{(m)}-\mu_{R_i}|$).
The new matched center is $\mu_{\pi(i)}^{(m+1)}=\mu_{A_i}$. Therefore
$$ \delta_i=\mu_{A_i}-\frac{|A_i|\mu_{A_i}+|R_i|\mu_{R_i}}{|A_i|+|R_i|} =\frac{|R_i|}{|A_i|+|R_i|}\,(\mu_{A_i}-\mu_{R_i}). $$Now eliminate $\mu_{A_i}$ in favor of $\mu_i^{(m)}$ by rearranging the convex combination:
$$ \mu_i^{(m)}-\mu_{R_i}=\frac{|A_i|}{|A_i|+|R_i|}(\mu_{A_i}-\mu_{R_i}). $$Substitute into the previous line to obtain
$$ \delta_i=\frac{|R_i|}{|A_i|}\,(\mu_i^{(m)}-\mu_{R_i}), $$and taking norms yields the last identity. □
How the Two Theorems Fit Together: Grounding Intuition
At fixed $\varepsilon$, increasing minPts changes $S(m)$ through exactly the two channels in Theorem 1:
- Matched-center motion is governed by $\sum|\delta_i|$. Theorem 2 makes $\delta_i$ concrete: it is driven by how much boundary mass is peeled and where that mass lies.
- Birth/death topology events dominate when clusters split or vanish. In the matching framework, a split typically appears as one matched survivor plus additional "birth" centers; a vanishing appears as a "death." These events can change $S(m)$ even if the surviving centers barely move.
This gives a crisp diagnostic: if $S(m)$ jumps but the $\delta_i$ are small, the jump must be largely topological (births/deaths). If $S(m)$ drifts smoothly with no births/deaths, it's largely peeling-driven center motion.
Conclusion
We have presented a rigorous mathematical framework for understanding how DBSCAN's behavior evolves as the minPts parameter increases at fixed $\varepsilon$. The matching-based decomposition (Theorem 1) separates continuous center motion from discrete topological events, while the peeling formula (Theorem 2) quantifies the mechanism by which boundary erosion drives centroid shifts. Together, these results provide both theoretical insight and practical diagnostic tools for parameter selection and cluster stability analysis in density-based clustering.